ABSTRACT

We present improved numerical approximations to the exact Poissonian confidence limits for 
small numbers n of observed events following the approach of Gehrels (1986). Analytic de- 
scriptions of all parameters used in the approximations are provided to allow their straightfor- 
ward inclusion in computer algorithms for processing of large data sets. Our estimates of the 
upper (lower) Poisson confidence limits are accurate to better than 1 % for n < 100 and values 
of S, the derived significance in units of Gaussian standard deviations, of up to 7 (5). In view 
of the slow convergence of the commonly used Gaussian approximations toward the correct 
Poissonian values, in particular for higher values of S, we argue that, for n < 40, Poissonian 
statistics should be used in most applications, unless errors of the order of, or exceeding, 10% 
are acceptable. 
1 INTRODUCTION

The need to assess the statistical significance of an observed small 
number of events is common in astrophysics as well as in virtually 
all other natural sciences. The number of neutrinos detected in an 
underground detector, the number of supernovae observed at z > 1,
the number of photons in a faint X-ray point source: all of these
are numbers likely to be in the Poisson regime, and knowing the 
correct errors of these measurements is obviously crucial to any 
scientific conclusions drawn from these numbers. 

In a much noted paper Gehrels (1986, from here on G86) provided
analytic approximations to the correct Poisson confidence limits for
small event numbers. A particularly useful new and improved
approximation in G86 is for the Poisson lower confidence limit,
$\lambda_l$ (see Section 2 below for an overview of the nomenclature
used). However, two issues limit the applicability of these
approximations to large data sets and/or applications that require
high confidence levels. First, Gehrels' analytic approximation to the
correct value of $\lambda_l$ uses empirically determined parameters
$\beta$ and $\gamma$, both of which are non-trivial functions of S,
the desired significance in units of Gaussian standard deviations.
G86 tabulates $\beta(S)$ and $\gamma(S)$ for ten values of S ranging
from 1 to 3.3$\sigma$ but does not provide an analytic formula that
would allow the reader to compute $\lambda_l$ for arbitrary S values^.
Second, the validity of all approximations presented and discussed in
G86 has only been verified in the same, relatively narrow range of
confidence levels from 1 to 3.3$\sigma$. To our knowledge, no
reasonably accurate approximations have been published for
S > 3.3$\sigma$.

^ Especially relevant in view of the fact that $\gamma(S)$ actually
becomes singular close to S = 1, as we show in Section 5.



While these issues may be of limited importance for many, if 
not most, applications, they can become serious in cases where con- 
fidence limits need to be computed for large sets of event numbers, 
particularly if high confidence levels are required. Consider, for
instance, X-ray astronomy, a traditionally photon-starved line of
observational research. With large-area, high-resolution X-ray CCD
detectors now in use on board the Chandra and XMM-Newton X- 
ray Observatories, X-ray images of dimensions 1024 x 1024 or 
even 4096 x 4096 pixels have become common. The numbers of 
photons registered in the vast majority of these pixels will be in the 
Poisson regime, and assessing accurately the significance of any 
features embedded in the very low and spatially non-uniform back- 
ground measured with these large arrays requires the accurate com- 
putation of a considerable number of Poisson confidence limits. A 
real-life example of a scientific project relevant in this context is 
the compilation of a statistical sample of unresolved (single-pixel) 
point sources detected at the greater than 5$\sigma$ confidence level (i.e.,
S = 5) in a set of Chandra ACIS-I images. 



It is for applications like the ones outlined above that we here 
present modified and improved versions of G86's approximations 
that are more accurate over an extended range of confidence levels
(S < 7$\sigma$). To permit these approximations to be incorporated
straightforwardly into computer algorithms for the processing of 
large data sets, we also provide analytic descriptions of all param- 
eters used, thus allowing the computation of accurate Poisson con- 
fidence limits for a wide range of event numbers and confidence 
levels. 



2 Harald Ebeling 



2 NOMENCLATURE 

In the following we adopt Gehrels' nomenclature and definitions. 
Specifically, upper limits $\lambda_u$ and lower limits $\lambda_l$
are, for Poisson statistics, defined by

$$\sum_{x=0}^{n} \frac{\lambda_u^x \, e^{-\lambda_u}}{x!} = 1 - \mathrm{CL}, \qquad n \ge 0, \tag{1}$$

and

$$\sum_{x=0}^{n-1} \frac{\lambda_l^x \, e^{-\lambda_l}}{x!} = \mathrm{CL}, \qquad n \ge 1, \tag{2}$$

where n is the number of events observed, and CL is the desired
confidence level (Eqs. 1 and 2 of G86). (For n = 0, $\lambda_l = 0$
for all values of CL.) For Gaussian statistics (i.e., a normal
probability distribution) CL is related to S, the equivalent Gaussian
number of $\sigma$, by

$$\mathrm{CL}(S) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{S} e^{-t^2/2} \, dt. \tag{3}$$



For ease of presentation we shall use S to parametrize a large range 
of CL values. 
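
Equations 1 to 3 can be evaluated numerically without any
approximation. The following is a minimal sketch using only the Python
standard library; the function names and the choice of bisection as the
root finder are illustrative assumptions and are not taken from G86 or
from the IDL code mentioned in Section 6. The sketch assumes S > 0.

```python
import math

def gaussian_tail(S):
    """Return 1 - CL(S) for the one-sided Gaussian confidence level of Eq. 3."""
    # erfc keeps precision for large S, where CL(S) itself rounds to 1.
    return 0.5 * math.erfc(S / math.sqrt(2.0))

def poisson_cdf(n, lam):
    """Return sum_{x=0}^{n} lam^x exp(-lam) / x!, the left-hand side of Eq. 1."""
    term = total = math.exp(-lam)
    for x in range(1, n + 1):
        term *= lam / x          # builds lam^x / x! incrementally
        total += term
    return total

def bisect(f, lo, hi, iterations=200):
    """Root of a decreasing function f with f(lo) > 0 >= f(hi)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def exact_upper_limit(n, S):
    """Solve Eq. 1 for lambda_u."""
    tail = gaussian_tail(S)
    hi = n + 1.0
    while poisson_cdf(n, hi) > tail:   # enlarge the bracket until it contains the root
        hi *= 2.0
    return bisect(lambda lam: poisson_cdf(n, lam) - tail, 0.0, hi)

def exact_lower_limit(n, S):
    """Solve Eq. 2 for lambda_l (zero by definition for n = 0)."""
    if n == 0:
        return 0.0
    tail = gaussian_tail(S)
    # Eq. 2 rewritten as P(X >= n) = 1 - CL; the root lies between 0 and n.
    return bisect(lambda lam: tail - (1.0 - poisson_cdf(n - 1, lam)), 0.0, float(n))
```

Two special cases offer a quick check of such an implementation: for
n = 0, Eq. 1 reduces to $\lambda_u = -\ln(1 - \mathrm{CL})$, and for
n = 1, Eq. 2 reduces to $\lambda_l = -\ln \mathrm{CL}$. The subtraction
1 - poisson_cdf(...) in the lower limit loses relative precision as
1 - CL approaches machine precision, so a direct summation of the upper
tail would be preferable for very large S.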

For n = 0 to 100 and selected values of S ranging from 1 to 3.3,
G86 tabulates $\lambda_u$ and $\lambda_l$ as obtained from Eqs. 1 and
2; Gehrels also presents analytic and numerical approximations
accurate to better than 2% for S < 3.3. In this paper, we test the
Gaussian approximation as well as the ones discussed in G86 over a
larger range of confidence levels (S < 7). We then modify Gehrels'
approximations to improve their performance specifically for large
values of S and, finally, present numerical (polynomial) descriptions
of all parameters used, to allow the computation of approximate
values of $\lambda_u(n, S)$ and $\lambda_l(n, S)$ for all values of n
up to at least 100 and S < 7.



3 THE GAUSSIAN APPROXIMATION

For a normal probability distribution (i.e., in the limit
$n \to \infty$) $\lambda_u$ and $\lambda_l$ are given by the
well-known expressions $n \pm S\sqrt{n}$, which are also commonly
used to approximate the not straightforwardly computable Poisson
limits for 'reasonably' large values of n. What 'reasonable' means in
this context is subject to debate; in practice, values of n in excess
of 20 (and sometimes even n > 10) are often deemed sufficiently large
to justify the use of the Gaussian approximation.

In Figure 1, we show the percentage error of the Gaussian
approximation $n \pm S\sqrt{n}$ when applied to event numbers in the
Poisson regime (loosely defined as n < 100). For low to moderate
confidence levels (S < 2) the Gaussian approximation is accurate to
better than 10% for n = 20 (but not for n = 10!), which may or may
not be sufficient for a given application. However, the error of the
Gaussian approximation, in particular for the lower limit, increases
rapidly for higher confidence levels. Already at S = 3, n > 45 is
required to limit the error in $\lambda_l$ to 10%; for S = 5 even
n = 100 is insufficient if 10% accuracy is sought. An accuracy of 10%
may, however, be far from sufficient in applications where errors of
many independent measurements are propagated. In the case of X-ray
spectral fitting, for instance, theoretical models are fitted
simultaneously to events registered in hundreds of independent energy
channels. A systematic error of the order of 10% in the errors on the
counts in all spectral bins, introduced by the use of Gaussian
approximations to the true Poisson errors, can lead to best-fit
parameter values that may be erroneous by significantly more than
the formal, statistical errors.
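
The size of the Gaussian error for a given n and S can be checked
against the exact limits. The sketch below does so with the Python
standard library only. The helper names are hypothetical, the exact
limits are obtained by bisection (one possible method among several),
and the percentage error is assumed here to be defined as
100 (approximate - exact) / exact; the sign convention used for
Figure 1 is not stated in the text.

```python
import math

def poisson_cdf(n, lam):
    """Poisson probability of observing n or fewer events for mean lam."""
    term = total = math.exp(-lam)
    for x in range(1, n + 1):
        term *= lam / x
        total += term
    return total

def solve(f, lo, hi):
    """Bisection for a function that is positive at lo and non-positive at hi."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0.0 else (lo, mid)
    return 0.5 * (lo + hi)

def exact_limits(n, S):
    """Exact Poisson (lower, upper) limits for n >= 1 events and S > 0."""
    tail = 0.5 * math.erfc(S / math.sqrt(2.0))      # 1 - CL(S)
    lower = solve(lambda lam: tail - (1.0 - poisson_cdf(n - 1, lam)), 0.0, float(n))
    hi = n + 1.0
    while poisson_cdf(n, hi) > tail:                # widen bracket for the upper limit
        hi *= 2.0
    upper = solve(lambda lam: poisson_cdf(n, lam) - tail, 0.0, hi)
    return lower, upper

def gaussian_percent_errors(n, S):
    """Percentage errors of n - S sqrt(n) and n + S sqrt(n) (lower, upper)."""
    lower, upper = exact_limits(n, S)
    root_n = math.sqrt(n)
    err_lower = 100.0 * ((n - S * root_n) - lower) / lower
    err_upper = 100.0 * ((n + S * root_n) - upper) / upper
    return err_lower, err_upper

for n, S in [(10, 1.0), (20, 1.0), (20, 2.0), (100, 5.0)]:
    lo_err, up_err = gaussian_percent_errors(n, S)
    print(f"n={n:3d} S={S:.1f}  lower: {lo_err:+6.1f}%  upper: {up_err:+6.1f}%")
```

Running the loop for other combinations of n and S gives a direct
impression of how slowly the Gaussian limits converge at high S.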



4 APPROXIMATION OF THE POISSONIAN UPPER 
LIMIT 

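G86 derives two analytic approximations to the true Poissonian
upper limit, namely

$$\lambda_u \approx (n+1) \left[ 1 - \frac{1}{9(n+1)} + \frac{S}{3\sqrt{n+1}} \right]^3 \qquad \text{(G86 Eq. 9)}$$

and

$$\lambda_u \approx n + S\sqrt{n+1} + \frac{S^2+2}{3} \qquad \text{(G86 Eq. 10)}$$

where the latter expression is simply the previous one expanded
and limited to the dominant terms in (n + 1).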
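
The step from G86 Eq. 9 to G86 Eq. 10 can be made explicit. The
following is a sketch of the expansion, using the shorthand m = n + 1
and the small quantity $\epsilon$ (both introduced here for
convenience) and assuming that Eq. 9 has the form
$\lambda_u \approx m\,[1 - 1/(9m) + S/(3\sqrt{m})]^3$, which is the
form consistent with Eq. 5 below:

```latex
\begin{align*}
\lambda_u &\approx m \left( 1 + \epsilon \right)^3,
  \qquad \epsilon = \frac{S}{3\sqrt{m}} - \frac{1}{9m}, \quad m = n + 1 \\
&= m \left( 1 + 3\epsilon + 3\epsilon^2 + \epsilon^3 \right) \\
&= m + S\sqrt{m} - \frac{1}{3} + \frac{S^2}{3} + O\!\left(m^{-1/2}\right) \\
&= n + S\sqrt{n+1} + \frac{S^2 + 2}{3} + O\!\left((n+1)^{-1/2}\right).
\end{align*}
```

The neglected terms fall off only as $(n+1)^{-1/2}$, which is why the
truncated form is expected to perform worst at small n.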

In Figure 2 we show the percentage error of these two approx- 
imations for a range of n and S values. Both expressions represent 
a considerable improvement over the Gaussian approximations (cf. 
Fig. 1). At low values of n the remaining error of several per cent 
may, however, still be too high for certain applications. Following 
the approach taken by Gehrels to improve his approximation to
$\lambda_l$ (see the following Section) we therefore apply a
heuristic correction to G86 Eq. 9 in the form of an additional term
$b\,(n+1)^c$:

$$\lambda_u \approx (n+1) \left[ 1 - \frac{1}{9(n+1)} + \frac{S}{3\sqrt{n+1}} + b\,(n+1)^c \right]^3. \tag{4}$$

We determine b = b(S) from the requirement that the above
approximation be an identity for n = 0, i.e.,

$$b(S) = \lambda_u(0, S)^{1/3} - 1 + \frac{1}{9} - \frac{S}{3}, \tag{5}$$

thus forcing better performance for low values of n. For each value
of S from 0 to 7, c = c(S) is then chosen such that the error of the
above approximation is minimized for 0 < n < 100. The runs of b and c
as functions of S are shown in Figure 3. We find c to be negative for
all values of S, with one-sided singularities at the roots of Eq. 5,
which lie at $S_{0,1} = 0.50688$ (corresponding to a confidence level
of 69.388%) and $S_{0,2} = 2.27532$ (confidence level 98.856%).
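
Because Eq. 1 reduces to $\lambda_u(0, S) = -\ln(1 - \mathrm{CL})$
for n = 0, the parameter b(S) of Eq. 5 can be computed in closed form.
The sketch below illustrates this and evaluates Eq. 4 for a
caller-supplied value of c; it assumes that Eq. 4 adds the term
b (n+1)^c inside the cubed bracket of G86 Eq. 9. The function names
are hypothetical, and no values of c from Table 1 are built in.

```python
import math

def gaussian_tail(S):
    """1 - CL(S) for the one-sided Gaussian confidence level (Eq. 3)."""
    return 0.5 * math.erfc(S / math.sqrt(2.0))

def b_of_S(S):
    """Parameter b(S) of Eq. 5, using lambda_u(0, S) = -ln(1 - CL)."""
    lambda_u0 = -math.log(gaussian_tail(S))
    return lambda_u0 ** (1.0 / 3.0) - 1.0 + 1.0 / 9.0 - S / 3.0

def upper_limit_eq4(n, S, c):
    """Eq. 4 with b from Eq. 5; the exponent c must be supplied by the caller."""
    m = n + 1.0
    bracket = 1.0 - 1.0 / (9.0 * m) + S / (3.0 * math.sqrt(m)) + b_of_S(S) * m ** c
    return m * bracket ** 3

# b(S) should vanish (to the quoted precision) at the two roots given in the text.
for S in (0.50688, 2.27532):
    cl_percent = 100.0 * (1.0 - gaussian_tail(S))
    print(f"S = {S}: CL = {cl_percent:.3f}%, b = {b_of_S(S):+.1e}")
```

For n = 0 the factor (n+1)^c equals one, so the result of
upper_limit_eq4 is independent of c and reproduces the exact limit,
as required by the construction of b.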

To allow the evaluation of Eq. 4 for any number of observed
events, n, and any confidence level, S, specifically in the proximity
of the mentioned singularities, we fit piecewise polynomial functions
to b(S) and c(1/S), or c($\log_{10} S$), with the degree of the
polynomial being determined by the requirement that the absolute
value of the residuals be less than 1% over the full S range of the
fit, except at the locations of the mentioned singularities. We find
acceptable polynomial descriptions of b(S) and c(S) as follows:
(7)
with $-10 < c(S) < 0$ overriding the above definition where
necessary, and coefficients $b_i$, $c_{1,i}$, $c_{2,i}$, $c_{3,i}$,
and $c_{4,i}$ as listed in Table 1.
The results of these fits are shown as the solid lines in Fig. 3.
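
The mechanics of evaluating one polynomial piece and applying the
override $-10 < c(S) < 0$ can be sketched as follows. This is an
illustration only: the polynomial is assumed to have the form
$\sum_i a_i x^i$, with x standing for whichever of S, 1/S, or
$\log_{10} S$ the piece is fitted in, and the coefficient list
PLACEHOLDER is made up for the example and does not reproduce any
entry of Table 1.

```python
def horner(coeffs, x):
    """Evaluate sum_i coeffs[i] * x**i (lowest order first) by Horner's scheme."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * x + a
    return result

def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))

def c_from_piece(coeffs, x, low=-10.0, high=0.0):
    """Polynomial value, overridden by the limits wherever it leaves [low, high]."""
    return clamp(horner(coeffs, x), low, high)

# Placeholder coefficients, NOT taken from Table 1.
PLACEHOLDER = [0.5, -2.0, 0.25]

print(c_from_piece(PLACEHOLDER, 2.0))    # inside the allowed range, returned unchanged
print(c_from_piece(PLACEHOLDER, 10.0))   # positive polynomial value, clamped to the upper limit
```

Clamping keeps the exponent finite near the singularities, where the
fitted polynomial would otherwise diverge.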

Figure 4 shows the relative, absolute errors of Eq. 4 and 
demonstrates that our approximation is accurate to better than 0.5% 
for all values of n and S considered here. 



5 APPROXIMATION OF THE POISSONIAN LOWER 
LIMIT 

As evidenced by Fig. 1 the Gaussian approximation
$\lambda_l(n, S) \approx n - S\sqrt{n}$ is a poor one for all but the
lowest values of S. G86 explores the behaviour of several more
sophisticated analytic approximations before resorting to modifying
the most promising of them by adding a heuristic power-law term
$\beta n^\gamma$ (we used the same approach in the preceding section
to improve the approximation to $\lambda_u$):

$$\lambda_l \approx n \left[ 1 - \frac{1}{9n} - \frac{S}{3\sqrt{n}} + \beta n^\gamma \right]^3 \qquad \text{(G86 Eq. 14)}$$



To find $\beta(S)$ and $\gamma(S)$ we proceed similarly as before
for our approximation to $\lambda_u$ and define $\beta(S)$ as

$$\beta(S) = \lambda_l(1, S)^{1/3} - 1 + \frac{1}{9} + \frac{S}{3}, \tag{8}$$

and then determine $\gamma(S)$ such that the error of the above
approximation is minimized for 0 < n < 100. The result, $\beta$ and
$\gamma$ as functions of S, is shown in Figure 5 which, in the
overlap region, agrees with Fig. 1 of G86. $\gamma$ is negative for
all values of S, with a one-sided singularity at the only root of
Eq. 8 at $S_0 = 0.93876$,
corresponding to a confidence level of 82.607%.
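
For n = 1, Eq. 2 reduces to $\lambda_l(1, S) = -\ln \mathrm{CL}$, so
$\beta(S)$ of Eq. 8 is available in closed form. The sketch below
computes it and evaluates G86 Eq. 14 for a caller-supplied $\gamma$;
it assumes that Eq. 14 has the form
$n\,[1 - 1/(9n) - S/(3\sqrt{n}) + \beta n^\gamma]^3$, which is the
form consistent with Eq. 8. The function names are hypothetical, and
no values of $\gamma$ from G86 or Table 2 are built in.

```python
import math

def gaussian_tail(S):
    """1 - CL(S) for the one-sided Gaussian confidence level (Eq. 3)."""
    return 0.5 * math.erfc(S / math.sqrt(2.0))

def beta_of_S(S):
    """beta(S) of Eq. 8, using lambda_l(1, S) = -ln(CL)."""
    # log1p(-tail) = ln(CL) stays accurate when CL is very close to 1.
    lambda_l1 = -math.log1p(-gaussian_tail(S))
    return lambda_l1 ** (1.0 / 3.0) - 1.0 + 1.0 / 9.0 + S / 3.0

def lower_limit_eq14(n, S, gamma):
    """G86 Eq. 14 for n >= 1 with beta from Eq. 8; gamma is supplied by the caller."""
    bracket = (1.0 - 1.0 / (9.0 * n) - S / (3.0 * math.sqrt(n))
               + beta_of_S(S) * n ** gamma)
    return n * bracket ** 3

S0 = 0.93876
print(f"CL(S0) = {100.0 * (1.0 - gaussian_tail(S0)):.3f}%, beta(S0) = {beta_of_S(S0):+.1e}")
```

As in the case of the upper limit, the approximation is exact for the
anchoring event number (here n = 1) regardless of the value chosen
for $\gamma$, since n^gamma equals one there.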

To facilitate the evaluation of Eq. 14 of G86 for a wide range of
values of n and S, we attempt to find analytical expressions for
$\beta(S)$ and $\gamma(S)$ (G86 quotes the values of either function
only at 10 locations between S = 1 and S = 3.291). In analogy to the
approach taken in the preceding section, we fit piecewise polynomial
functions to $\beta(S)$, as well as to $\gamma(S)$, $\gamma(1/S)$, or
$\gamma(\log_{10} S)$, with the degree of the polynomial being
determined by the requirement that the absolute value of the
residuals be less than 2% over the full S range of the fit (less than
0.1% at the high-S end where high accuracy is critical). We find
acceptable polynomial descriptions of $\beta(S)$ and $\gamma(S)$ as
follows:
$S_0 < S < 2.7$ (10)
S > 2.7

with $-50 < \gamma(S) < 0$ overriding the above where necessary, and
coefficients $\beta_{2,i}$, $\gamma_{1,i}$, $\gamma_{2,i}$, and
$\gamma_{3,i}$ as listed in Table 2. The results of these fits are
shown as the solid lines in Fig. 5.

Equation 14 of G86 indeed yields greatly reduced errors relative to
the exact values of $\lambda_l$. For n < 100 and 1 < S < 3.291 G86
quotes an accuracy of better than 2% for the above approximation.
Figure 6 confirms this, but also demonstrates that the errors become
unacceptably large (> 10%) for higher confidence levels and small to
moderate values of n.

We now attempt to improve on Eq. 14 of G86 by adding a second,
higher-order correction term. As illustrated by Fig. 6 such an
additional term would have to improve the performance of the
approximation particularly in the high-S regime for which Eq. 14 of
G86 was not optimized. This goal can be achieved by introducing a
(totally ad hoc) sinusoidal term which adds only one additional
parameter $\delta$:



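$$\lambda_l \approx n \left[ 1 - \frac{1}{9n} - \frac{S}{3\sqrt{n}} + \beta n^\gamma + \delta \sin\!\left( \frac{5}{n + 1/4}\,\frac{\pi}{2} \right) \right]^3. \tag{11}$$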



Since the sinusoidal term, by design, vanishes for n = 1, Eq. 8
still holds, and continues to define $\beta(S)$. $\gamma(S)$
(slightly different from the one determined from Eq. 14 of G86) and
$\delta(S)$ are again obtained by iteratively minimizing the absolute
error of the approximation for 0 < n < 100. With $\beta(S)$ unchanged
and $\gamma(S)$ virtually indistinguishable from the data shown in
Fig. 5 we can focus on $\delta(S)$, which shows a complex behaviour
(Fig. 7). We do not attempt to model the run of $\delta(S)$ for small
values of S where the function remains close to zero. Instead, we fit
a high-order polynomial to the high-S end and set $\delta(S) = 0$ for
S < 1.2:



$$\delta(S) = \begin{cases} 0, & S < 1.2 \\ \sum_i \delta_i S^i, & S > 1.2 \end{cases} \tag{12}$$



with coefficients $\delta_i$ as listed in Table 3. The results of
these fits are shown as the solid line in Fig. 7.

Figure 8 demonstrates that Eq. 11 provides an approximation to
$\lambda_l$ that is accurate to better than 1% when the polynomial
fits to $\beta(S)$, $\gamma(S)$, and $\delta(S)$ (Eqs. 9, 10, 12) are
used, except for n = 1 where an error of just over 1% is observed.



6 SUMMARY 

The Gaussian approximation $\lambda_u \approx n + S\sqrt{n}$ to the
true Poissonian upper confidence limit is acceptable for low
confidence levels (S < 3$\sigma$) and n > 40, but becomes
increasingly inaccurate for higher values of S. The situation is
worse for the Gaussian approximation
$\lambda_l \approx n - S\sqrt{n}$ to the true Poissonian lower
confidence limit, which is off by more than 10% at S = 5 even at
n = 100. The approximations proposed by G86 greatly improve upon the
Gaussian estimates but are still inaccurate at the 10% level for low
values of n and high confidence levels.

Building on Gehrels' work we present improved algebraic
approximations which reduce the error with respect to the true
Poissonian confidence limits to under 1% for S < 7 (Poisson upper
limit) and S < 5 (Poisson lower limit). Although we have tested
these equations only for n < 100, their analytic behaviour suggests 
that they hold for all values of n (cf. Figures 4 and 8). 

To allow the numerical computation of approximate Poisso- 
nian confidence limits for arbitrary combinations of n and S within 
the quoted ranges, we provide the coefficients of piecewise polyno- 
mial fits to all parameters used in the definition of either approxi- 
mation. 

All figures of this paper were produced using the Interactive 
Data Language (IDL); the IDL source code of the approximations 
poisson_uplim (Eq. 4) and poisson_lolim (Eq. 11) is
available from the author.

HE gratefully acknowledges financial support from NASA 
LTSA grant NAG 5-8253 and NASA ADP grant NAG 5-9238. 
Table 1. Coefficients of polynomial fits to b(S) and c(S) of Eq. 4, as
defined in Eqs. 6 and 7.

Table 2. Coefficients of polynomial fits to $\beta(S)$ and $\gamma(S)$
of G86, Eq. 14, as defined in Eqs. 9 and 10.

Table 3. Coefficients of polynomial fits to $\beta(S)$, $\gamma(S)$,
and $\delta(S)$ of Eq. 11, as defined in Eqs. 9, 10, and 12.

Figure 1. Percentage error of the Gaussian approximations
$\lambda_u \approx n + S\sqrt{n}$ (top) and
$\lambda_l \approx n - S\sqrt{n}$ (bottom) as a function of n. In each
panel the S values of the shown curves vary from lower left to upper
right in steps of 0.5 as indicated; the thick lines correspond to
S = 1, 2, and 3. Note how, for n = 10 (marked by the dotted line), the
errors still reach and exceed 10% for all but the lowest values of S.

Figure 2. Percentage error of the approximations of G86 Eq. 9 (top) and
G86 Eq. 10 (bottom) as a function of n. In each panel the S values of the 
shown curves vary from 1 to 7 as annotated. For essentially all values of 
n and S explored here the error of either approximation remains below the 
10% level for n > 2, and below 1% for n > 35. 

Figure 3. Run of parameters b (top) and c (bottom) of Eq. 4 as a
function of S, the equivalent number of Gaussian $\sigma$. c exhibits
singularities at S = 0.507 and S = 2.275 where b = 0. The bullets mark
the locations at which b(S) and c(S) were computed, the solid lines
mark polynomial fits to the data (see text for details).







Figure 4. Percentage error of the approximations of Eq. 4 with b(S)
and c(S) as computed (top), and using the polynomial fits of Eqs. 6
and 7 (bottom), as a function of n. In each panel the S values of the
shown curves vary from 0.5 to 7 as annotated. For all values of n and
S explored here the error remains below the 0.5% level.

Figure 5. Run of parameters $\beta$ (top) and $\gamma$ (bottom) of
G86 Eq. 14 as a function of S, the equivalent number of Gaussian
$\sigma$. $\gamma$ exhibits a singularity at S = 0.939 where
$\beta = 0$. The bullets mark the locations at which $\beta(S)$ and
$\gamma(S)$ were computed, the solid lines mark polynomial fits to the
data (see text for details).

Figure 6. Percentage error of the approximation to $\lambda_l$ given
by G86 Eq. 14 with $\beta(S)$ and $\gamma(S)$ as computed (top), and
using the polynomial fits of Eqs. 9 and 10 (bottom), as a function of
n. In each panel the S values of the shown curves vary from 1 to 5 as
annotated. While the approximation is good for low to moderate
confidence levels (S < 3.5) it fails for S > 4 where errors
approaching and exceeding 10% are observed for small values of n.

Figure 7. Run of $\delta$ of Eq. 11 as a function of S, the
equivalent number of Gaussian $\sigma$. The bullets mark the locations
at which $\delta(S)$ was computed, the solid line marks a polynomial
fit to the data (see text for details).

Figure 8. Percentage error of the approximations of Eq. 11 with
$\beta(S)$, $\gamma(S)$, and $\delta(S)$ as computed (top), and using
the polynomial fits of Eqs. 9, 10, and 12 (bottom) as a function of n.
In each panel the S values of the shown curves vary from 0.5 to 5 as
annotated. For all values of n (except n = 1) and S explored here the
error remains below the 1% level.



